{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# CNN\n",
    "卷积神经网络（convolutional neural network）是含有卷积层（convolutional layer）的神经网络。本章中介绍的卷积神经网络均使用最常见的二维卷积层。它有高和宽两个空间维度，常用来处理图像数据。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "import torch.nn as nn\n",
    "import sys\n",
    "sys.path.append(\"..\") "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "先计算 应用 滤波器 f 应用于 X矩阵 后， 产生的输出的大小h, w\n",
    "然后分别滑动窗口计算 每一个patch 的输出，sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def corr2d(X, kernel):\n",
    "    h, w = kernel.shape\n",
    "    out = torch.zeros(X.size()[0] - h + 1, X.size()[1] - w + 1 )\n",
    "    for i in range(out.shape[0]):\n",
    "        for j in range(out.shape[1]):\n",
    "            out[i, j] = (X[i : i+h, j : j+w] *kernel).sum()\n",
    "    return out\n",
    "    \n",
    "X = torch.tensor([[0, 1, 2], [3, 4, 5], [6, 7, 8]])\n",
    "K = torch.tensor([[0, 1], [1, 0]])\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[ 4.,  6.],\n",
       "        [10., 12.]])"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "corr2d(X,K)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " ## 二维卷积层 Pytorch\n",
    " \n",
    " \n",
    " 卷积窗口形状为$p \\times q$的卷积层称为$p \\times q$卷积层。同样，$p \\times q$卷积或$p \\times q$卷积核说明卷积核的高和宽分别为$p$和$q$。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "class Conv2D(nn.Module):\n",
    "    def __init__(self, kernel_size):\n",
    "        super(Conv2D, self).__init__()\n",
    "        self.weight = nn.Parameter(torch.randn(kernel_size))\n",
    "        self.bias = nn.Parameter(torch.randn(1))\n",
    "\n",
    "    def forward(self, x):\n",
    "        return corr2d(x, self.weight) + self.bias"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "下面我们来看一个卷积层的简单应用：检测图像中物体的边缘，即找到像素变化的位置。首先我们构造一张$6\\times 8$的图像（即高和宽分别为6像素和8像素的图像）。它中间4列为黑（0），其余为白（1）。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "torch.float32\n"
     ]
    }
   ],
   "source": [
    "image = torch.ones(6,8)\n",
    "image[:,2:6] = 0\n",
    "print(image.dtype)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "from PIL import Image\n",
    "import numpy as np\n",
    "def show_numpy_GRAY(image):\n",
    "    data = image.numpy().astype(np.uint8)\n",
    "    data *= 255\n",
    "    img = Image.fromarray(data, 'L')\n",
    "    img.show()\n",
    "show_numpy_GRAY(image)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "K = torch.tensor([[1.0, -1]])\n",
    "Y = corr2d(image, K)\n",
    "show_numpy_GRAY(Y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 通过数据学习核数组\n",
    "\n",
    "最后我们来看一个例子，它使用物体边缘检测中的输入数据`X`和输出数据`Y`来学习我们构造的核数组`K`。我们首先构造一个卷积层，其卷积核将被初始化成随机数组。接下来在每一次迭代中，我们使用平方误差来比较`Y`和卷积层的输出，然后计算梯度来更新权重。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "torch.float32\n",
      "weight:  tensor([[ 0.1557, -1.3045]])\n",
      "bias:  tensor([0.4842])\n",
      "Step 5, loss 1.811\n",
      "Step 10, loss 0.255\n",
      "Step 15, loss 0.043\n",
      "Step 20, loss 0.009\n",
      "Step 25, loss 0.002\n",
      "Step 30, loss 0.001\n",
      "weight:  tensor([[ 0.9934, -0.9949]])\n",
      "bias:  tensor([0.0008])\n"
     ]
    }
   ],
   "source": [
    "# 构造一个核数组形状是(1, 2)的二维卷积层\n",
    "CONV2D = Conv2D(kernel_size=(1, 2))\n",
    "K = torch.tensor([[1, -1]], dtype = torch.float)\n",
    "print(K.dtype)\n",
    "Y = corr2d(image, K)\n",
    "\n",
    "step = 30\n",
    "lr = 0.01\n",
    "\n",
    "print(\"weight: \", CONV2D.weight.data)\n",
    "print(\"bias: \", CONV2D.bias.data)\n",
    "\n",
    "for i in range(step):\n",
    "    Y_hat = CONV2D(image)\n",
    "    \n",
    "    l = ((Y_hat - Y) ** 2).sum()\n",
    "    l.backward()\n",
    "    \n",
    "    CONV2D.weight.data -= lr * CONV2D.weight.grad\n",
    "    CONV2D.bias.data -= lr * CONV2D.bias.grad\n",
    "    \n",
    "    # 梯度清0\n",
    "    CONV2D.weight.grad.fill_(0)\n",
    "    CONV2D.bias.grad.fill_(0)\n",
    "    if (i + 1) % 5 == 0:\n",
    "        print('Step %d, loss %.3f' % (i + 1, l.item()))\n",
    "        \n",
    "print(\"weight: \", CONV2D.weight.data)\n",
    "print(\"bias: \", CONV2D.bias.data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "可以看到，学到的卷积核的权重参数与我们之前定义的核数组`K`较接近，而偏置参数接近0。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "weight:  tensor([[ 0.9934, -0.9949]])\n",
      "bias:  tensor([0.0008])\n"
     ]
    }
   ],
   "source": [
    "print(\"weight: \", CONV2D.weight.data)\n",
    "print(\"bias: \", CONV2D.bias.data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 互相关运算和卷积运算\n",
    "## 标准的卷积操作是从右下角开始，从右向左，从下到上，依次相乘。\n",
    "\n",
    "## 深度学习中的卷积，其实是互相关，是直观的对应相乘求和操作。\n",
    "\n",
    "## 互相关运算和卷积运算\n",
    "\n",
    "实际上，卷积运算与互相关运算类似。**为了得到卷积运算的输出，我们只需将核数组左右翻转并上下翻转，再与输入数组做互相关运算**。可见，卷积运算和互相关运算虽然类似，但如果它们使用相同的核数组，对于同一个输入，输出往往并不相同。\n",
    "\n",
    "那么，你也许会好奇卷积层为何能使用互相关运算替代卷积运算。其实，在深度学习中核数组都是学出来的：卷积层无论使用互相关运算或卷积运算都不影响模型预测时的输出。为了解释这一点，假设卷积层使用互相关运算学出图5.1中的核数组。设其他条件不变，使用卷积运算学出的核数组即图5.1中的核数组按上下、左右翻转。也就是说，图5.1中的输入与学出的已翻转的核数组再做卷积运算时，依然得到图5.1中的输出。为了与大多数深度学习文献一致，如无特别说明，本书中提到的卷积运算均指互相关运算。\n",
    "\n",
    "> 注：感觉深度学习中的卷积运算实际上是互相关运算是个面试题考点。\n",
    "\n",
    "## 5.1.6 特征图和感受野\n",
    "\n",
    "二维卷积层输出的二维数组可以看作是输入在空间维度（宽和高）上某一级的表征，也叫特征图（feature map）。影响元素$x$的前向计算的所有可能输入区域（可能大于输入的实际尺寸）叫做$x$的感受野（receptive field）。以图5.1为例，输入中阴影部分的四个元素是输出中阴影部分元素的感受野。我们将图5.1中形状为$2 \\times 2$的输出记为$Y$，并考虑一个更深的卷积神经网络：将$Y$与另一个形状为$2 \\times 2$的核数组做互相关运算，输出单个元素$z$。那么，$z$在$Y$上的感受野包括$Y$的全部四个元素，在输入上的感受野包括其中全部9个元素。可见，我们可以通过更深的卷积神经网络使特征图中单个元素的感受野变得更加广阔，从而捕捉输入上更大尺寸的特征。\n",
    "\n",
    "我们常使用“元素”一词来描述数组或矩阵中的成员。在神经网络的术语中，这些元素也可称为“单元”。当含义明确时，本书不对这两个术语做严格区分。\n",
    "\n",
    "## 小结\n",
    "\n",
    "- 二维卷积层的核心计算是二维互相关运算。在最简单的形式下，它对二维输入数据和卷积核做互相关运算然后加上偏差。\n",
    "- 我们可以设计卷积核来检测图像中的边缘。\n",
    "- 我们可以通过数据来学习卷积核。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 填充和步幅\n",
    "\n",
    "在上一节的例子里，我们使用高和宽为3的输入与高和宽为2的卷积核得到高和宽为2的输出。一般来说，假设输入形状是$n_h\\times n_w$，卷积核窗口形状是$k_h\\times k_w$，那么输出形状将会是\n",
    "\n",
    "$$(n_h-k_h+1) \\times (n_w-k_w+1).$$\n",
    "\n",
    "所以卷积层的输出形状由输入形状和卷积核窗口形状决定。本节我们将介绍卷积层的两个超参数，即填充和步幅。它们可以对给定形状的输入和卷积核改变输出形状。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "torch.Size([8, 8])\n"
     ]
    }
   ],
   "source": [
    "X=torch.randn(8,8)\n",
    "print(X.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(1, 1, 8, 8)"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(1,1)+X.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "torch.Size([8, 8])"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 注意这里是两侧分别填充1行或列，所以在两侧一共填充2行或列\n",
    "conv2d = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1)\n",
    "# 是不行的，X 是一个元素，8*8\n",
    "# pytorch 的 输出必须是一个四维 tensor batch_size, channels, H, W\n",
    "X=torch.randn(8,8)\n",
    "X = X.view((1,1)+X.shape)\n",
    "Y = conv2d(X)\n",
    "Y.shape[2:]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "torch.Size([8, 8])"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Y.shape[2:]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 步幅\n",
    "\n",
    "在上一节里我们介绍了二维互相关运算。卷积窗口从输入数组的最左上方开始，按从左往右、从上往下的顺序，依次在输入数组上滑动。我们将每次滑动的行数和列数称为步幅（stride）。\n",
    "\n",
    "一般来说，当高上步幅为$s_h$，宽上步幅为$s_w$时，输出形状为\n",
    "\n",
    "$$\\lfloor(n_h-k_h+2*p_h+s_h  )/s_h\\rfloor \\times \\lfloor(n_w-k_w+ 2*p_w+s_w )/s_w\\rfloor.$$\n",
    "\n",
    "如果设置$p_h=k_h-1$和$p_w=k_w-1$，那么输出形状将简化为$\\lfloor(n_h+s_h-1)/s_h\\rfloor \\times \\lfloor(n_w+s_w-1)/s_w\\rfloor$。更进一步，如果输入的高和宽能分别被高和宽上的步幅整除，那么输出形状将是$(n_h/s_h) \\times (n_w/s_w)$。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "torch.Size([4, 4])"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "conv2d = nn.Conv2d(1, 1, kernel_size=3, padding=1, stride=2)\n",
    "\n",
    "X=torch.randn(8,8)\n",
    "X = X.view((1,1)+X.shape)\n",
    "Y = conv2d(X)\n",
    "Y.shape[2:]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "计算过程\n",
    "$$\n",
    "{(h - k_h + 2*padding_h   + stride_h )/ stride_h}  \\times {(w - k_w+ 2*padding_w  +  stride_w )/ stride_w }\n",
    "$$\n",
    "\n",
    "$$\n",
    "{(8-3+1*2+2)/2} \\times {(8-3+1*2+2)/2} = 4\\times 4\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "torch.Size([2, 2])"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "conv2d = nn.Conv2d(1, 1, kernel_size=(5, 5), padding=(0, 1), stride=(3, 4))\n",
    "\n",
    "X=torch.randn(8,8)\n",
    "X = X.view((1,1)+X.shape)\n",
    "Y = conv2d(X)\n",
    "Y.shape[2:]\n",
    "\n",
    "#当计算公式为小数时，向下取整，因为最后分数次是无法卷积的/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "{(8-5+2*0 +3)/3} \\times {(8-5+2*1+4)/4} = 2\\times 2\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "为了表述简洁，当输入的高和宽两侧的填充数分别为$p_h$和$p_w$时，我们称填充为$(p_h, p_w)$。特别地，当$p_h = p_w = p$时，填充为$p$。当在高和宽上的步幅分别为$s_h$和$s_w$时，我们称步幅为$(s_h, s_w)$。特别地，当$s_h = s_w = s$时，步幅为$s$。在默认情况下，填充为0，步幅为1。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 多通道\n",
    "## 多通道输入"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[0, 1, 2],\n",
      "        [3, 4, 5],\n",
      "        [6, 7, 8]]) tensor([[0, 1],\n",
      "        [2, 3]])\n",
      "tensor([[1, 2, 3],\n",
      "        [4, 5, 6],\n",
      "        [7, 8, 9]]) tensor([[1, 2],\n",
      "        [3, 4]])\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "tensor([[ 56.,  72.],\n",
       "        [104., 120.]])"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import torch\n",
    "from torch import nn\n",
    "\n",
    "\n",
    "def corr2d_multi_in(X, K):\n",
    "    # 沿着X和K的第0维（通道维）分别计算再相加\n",
    "    print(X[0, :, :], K[0, :, :])\n",
    "    res = corr2d(X[0, :, :], K[0, :, :])\n",
    "    #接下来我们实现含多个输入通道的互相关运算。我们只需要对每个通道做互相关运算，然后进行累加。\n",
    "    for i in range(1, X.shape[0]):\n",
    "        print(X[i, :, :], K[i, :, :])\n",
    "        res += corr2d(X[i, :, :], K[i, :, :])\n",
    "    return res\n",
    "\n",
    "X = torch.tensor([[[0, 1, 2], [3, 4, 5], [6, 7, 8]],\n",
    "              [[1, 2, 3], [4, 5, 6], [7, 8, 9]]])\n",
    "\n",
    "K = torch.tensor([[[0, 1], [2, 3]], [[1, 2], [3, 4]]])\n",
    "\n",
    "corr2d_multi_in(X, K)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 多通道输出\n",
    "\n",
    "torch.stack 堆叠 增加新的维度进行堆叠"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[ 0.0219,  0.2842, -1.5298,  0.4824],\n",
      "        [ 0.7153,  0.4799,  0.4699, -1.2945],\n",
      "        [ 0.1387,  0.9134, -0.3478, -1.1283]])\n",
      "tensor([[-0.3726,  2.1162,  0.2183,  0.9948],\n",
      "        [-1.3020,  1.0739,  0.3250,  0.8928],\n",
      "        [ 0.3016, -1.2595, -1.3800, -2.0265]])\n",
      "tensor([[[ 0.0219,  0.2842, -1.5298,  0.4824],\n",
      "         [ 0.7153,  0.4799,  0.4699, -1.2945],\n",
      "         [ 0.1387,  0.9134, -0.3478, -1.1283]],\n",
      "\n",
      "        [[-0.3726,  2.1162,  0.2183,  0.9948],\n",
      "         [-1.3020,  1.0739,  0.3250,  0.8928],\n",
      "         [ 0.3016, -1.2595, -1.3800, -2.0265]]])\n",
      "tensor([[[ 0.0219,  0.2842, -1.5298,  0.4824],\n",
      "         [-0.3726,  2.1162,  0.2183,  0.9948]],\n",
      "\n",
      "        [[ 0.7153,  0.4799,  0.4699, -1.2945],\n",
      "         [-1.3020,  1.0739,  0.3250,  0.8928]],\n",
      "\n",
      "        [[ 0.1387,  0.9134, -0.3478, -1.1283],\n",
      "         [ 0.3016, -1.2595, -1.3800, -2.0265]]])\n",
      "tensor([[[ 0.0219, -0.3726],\n",
      "         [ 0.2842,  2.1162],\n",
      "         [-1.5298,  0.2183],\n",
      "         [ 0.4824,  0.9948]],\n",
      "\n",
      "        [[ 0.7153, -1.3020],\n",
      "         [ 0.4799,  1.0739],\n",
      "         [ 0.4699,  0.3250],\n",
      "         [-1.2945,  0.8928]],\n",
      "\n",
      "        [[ 0.1387,  0.3016],\n",
      "         [ 0.9134, -1.2595],\n",
      "         [-0.3478, -1.3800],\n",
      "         [-1.1283, -2.0265]]])\n"
     ]
    }
   ],
   "source": [
    "data1 = torch.randn(3,4)\n",
    "print(data1)\n",
    "data2 = torch.randn(3,4)\n",
    "print(data2)\n",
    "\n",
    "# 增加一个维度， 两个tensor 堆叠\n",
    "print(torch.stack([data1,data2]))\n",
    "\n",
    "print(torch.stack([data1,data2], dim =1))\n",
    "\n",
    "print(torch.stack([data1,data2], dim =2))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "torch.Size([3, 2, 2, 2])\n",
      "tensor([[0, 1, 2],\n",
      "        [3, 4, 5],\n",
      "        [6, 7, 8]]) tensor([[0, 1],\n",
      "        [2, 3]])\n",
      "tensor([[1, 2, 3],\n",
      "        [4, 5, 6],\n",
      "        [7, 8, 9]]) tensor([[1, 2],\n",
      "        [3, 4]])\n",
      "tensor([[0, 1, 2],\n",
      "        [3, 4, 5],\n",
      "        [6, 7, 8]]) tensor([[1, 2],\n",
      "        [3, 4]])\n",
      "tensor([[1, 2, 3],\n",
      "        [4, 5, 6],\n",
      "        [7, 8, 9]]) tensor([[2, 3],\n",
      "        [4, 5]])\n",
      "tensor([[0, 1, 2],\n",
      "        [3, 4, 5],\n",
      "        [6, 7, 8]]) tensor([[2, 3],\n",
      "        [4, 5]])\n",
      "tensor([[1, 2, 3],\n",
      "        [4, 5, 6],\n",
      "        [7, 8, 9]]) tensor([[3, 4],\n",
      "        [5, 6]])\n",
      "tensor([[[ 56.,  72.],\n",
      "         [104., 120.]],\n",
      "\n",
      "        [[ 76., 100.],\n",
      "         [148., 172.]],\n",
      "\n",
      "        [[ 96., 128.],\n",
      "         [192., 224.]]])\n"
     ]
    }
   ],
   "source": [
    "K = torch.stack([K, K + 1, K + 2])\n",
    "def coor2d_muliti_in_out(X,K):\n",
    "    out = []\n",
    "    for k in K:\n",
    "        out.append(corr2d_multi_in(X,k))\n",
    "        #stack 将数组 合并 按维度\n",
    "    return torch.stack(out)\n",
    "\n",
    "\n",
    "\n",
    "print(K.shape) # torch.Size([3, 2, 2, 2])\n",
    "\n",
    "\n",
    "out = coor2d_muliti_in_out(X, K)\n",
    "print(out)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1X1 卷积\n",
    "\n",
    "### 其实可以考虑为一个信道压缩的方式， 输出大小不变，通道减少，将一个通道看作一个特征，然后进行压缩？\n",
    "最后我们讨论卷积窗口形状为$1\\times 1$（$k_h=k_w=1$）的多通道卷积层。我们通常称之为$1\\times 1$卷积层，并将其中的卷积运算称为$1\\times 1$卷积。因为使用了最小窗口，$1\\times 1$卷积失去了卷积层可以识别高和宽维度上相邻元素构成的模式的功能。实际上，$1\\times 1$卷积的主要计算发生在通道维上。图5.5展示了使用输入通道数为3、输出通道数为2的$1\\times 1$卷积核的互相关计算。值得注意的是，输入和输出具有相同的高和宽。输出中的每个元素来自输入中在高和宽上相同位置的元素在不同通道之间的按权重累加。假设我们将通道维当作特征维，将高和宽维度上的元素当成数据样本，**那么$1\\times 1$卷积层的作用与全连接层等价**。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 小结\n",
    "\n",
    "- 使用多通道可以拓展卷积层的模型参数。\n",
    "- 假设将通道维当作特征维，将高和宽维度上的元素当成数据样本，那么$1\\times 1$卷积层的作用与全连接层等价。\n",
    "- $1\\times 1$卷积层通常用来调整网络层之间的通道数，并控制模型复杂度。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 池化"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "from torch import nn\n",
    "\n",
    "def pool2d(X, pool_size, mode='max'):\n",
    "    X = X.float()\n",
    "    p_h, p_w = pool_size\n",
    "    Y = torch.zeros(X.shape[0] - p_h + 1, X.shape[1] - p_w + 1)\n",
    "    for i in range(Y.shape[0]):\n",
    "        for j in range(Y.shape[1]):\n",
    "            if mode == 'max':\n",
    "                Y[i, j] = X[i: i + p_h, j: j + p_w].max()\n",
    "            elif mode == 'avg':\n",
    "                Y[i, j] = X[i: i + p_h, j: j + p_w].mean()       \n",
    "    return Y"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[4., 5.],\n",
       "        [7., 8.]])"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X = torch.tensor([[0, 1, 2], [3, 4, 5], [6, 7, 8]])\n",
    "pool2d(X, (2, 2))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[2., 3.],\n",
       "        [5., 6.]])"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pool2d(X, (2, 2), 'avg')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 填充和步幅\n",
    "\n",
    "同卷积层一样，池化层也可以在输入的高和宽两侧的填充并调整窗口的移动步幅来改变输出形状。池化层填充和步幅与卷积层填充和步幅的工作机制一样。我们将通过`nn`模块里的二维最大池化层`MaxPool2d`来演示池化层填充和步幅的工作机制。我们先构造一个形状为(1, 1, 4, 4)的输入数据，前两个维度分别是批量和通道。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "X = torch.arange(16, dtype=torch.float).view((1, 1, 4, 4))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[[[ 0.,  1.,  2.,  3.],\n",
      "          [ 4.,  5.,  6.,  7.],\n",
      "          [ 8.,  9., 10., 11.],\n",
      "          [12., 13., 14., 15.]]]])\n"
     ]
    }
   ],
   "source": [
    "print(X)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "默认情况下，`MaxPool2d`实例里步幅和池化窗口形状相同。下面使用形状为(3, 3)的池化窗口，默认获得形状为(3, 3)的步幅。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[ 5.,  6.,  7.],\n",
      "        [ 9., 10., 11.],\n",
      "        [13., 14., 15.]])\n",
      "tensor([[[[10.]]]])\n"
     ]
    }
   ],
   "source": [
    "print(pool2d(X.view(-1,4),(2,2)))\n",
    "pool2d = nn.MaxPool2d(3)\n",
    "print(pool2d(X)) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[[[ 5.,  7.],\n",
       "          [13., 15.]]]])"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pool2d = nn.MaxPool2d(3, padding=1, stride=2)\n",
    "pool2d(X)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[[[ 1.,  3.],\n",
       "          [ 9., 11.],\n",
       "          [13., 15.]]]])"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pool2d = nn.MaxPool2d((2, 4), padding=(1, 2), stride=(2, 3))\n",
    "pool2d(X)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  多通道\n",
    "\n",
    "在处理多通道输入数据时，**池化层对每个输入通道分别池化，而不是像卷积层那样将各通道的输入按通道相加**。这意味着池化层的输出通道数与输入通道数相等。下面将数组`X`和`X+1`在通道维上连结来构造通道数为2的输入。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[[[ 0.,  1.,  2.,  3.],\n",
       "          [ 4.,  5.,  6.,  7.],\n",
       "          [ 8.,  9., 10., 11.],\n",
       "          [12., 13., 14., 15.]],\n",
       "\n",
       "         [[ 1.,  2.,  3.,  4.],\n",
       "          [ 5.,  6.,  7.,  8.],\n",
       "          [ 9., 10., 11., 12.],\n",
       "          [13., 14., 15., 16.]]]])"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X = torch.cat((X, X + 1), dim=1)\n",
    "X"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[[[ 5.,  7.],\n",
       "          [13., 15.]],\n",
       "\n",
       "         [[ 6.,  8.],\n",
       "          [14., 16.]]]])"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pool2d = nn.MaxPool2d(3, padding=1, stride=2)\n",
    "pool2d(X)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 小结\n",
    "\n",
    "- 最大池化和平均池化分别取池化窗口中输入元素的最大值和平均值作为输出。\n",
    "- 池化层的一个主要作用是缓解卷积层对位置的过度敏感性。\n",
    "- 可以指定池化层的填充和步幅。\n",
    "- 池化层的输出通道数跟输入通道数相同。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
